Transfer learned deep feature based crack detection using support vector machine: a comparative study

Technology offers considerable potential for improving the integrity and efficiency of infrastructure. Cracks are a major concern because they can affect the integrity or usability of any structure. Manual inspection often leads to delays that can worsen the situation, so automated crack detection has become necessary for the efficient management and inspection of critical infrastructure. Previous research in crack detection employed classification and localization-based models using Deep Convolutional Neural Networks (DCNNs). This study proposes and compares the effectiveness of transfer-learned DCNNs for crack detection, both as a classification model and as a feature extractor. The main objective of this paper is to present various methods of crack detection on surfaces and compare their performance over three different datasets. The experiments conducted in this work are threefold. First, the effectiveness of 12 transfer-learned DCNN models for crack detection is analyzed on three publicly available datasets: SDNET, CCIC and BCD. With an accuracy of 53.40%, ResNet101 outperformed the other models on the SDNET dataset. EfficientNetB0 was the most accurate model (98.8%) on the BCD dataset, and ResNet50 performed best on the CCIC dataset with an accuracy of 99.8%. Second, two image enhancement methods are applied to the images, and the 12 DCNNs are transfer learned on the enhanced images in order to improve performance on the SDNET dataset. The results from the experiments show that the enhanced images improved the accuracy of transfer-learned crack detection models significantly. Third, deep features extracted from the last fully connected layer of the DCNNs are used to train a Support Vector Machine (SVM). According to the analysis in terms of accuracy, precision, recall, and F1-score, the integration of deep features with SVM enhanced the detection accuracy across all the DCNN-dataset combinations.

frequent manual inspections impractical. Efficiently identifying surface cracks within a specific timeframe is crucial for enhancing the maintenance protocols of buildings. This swift detection allows for timely interventions, preventing the deterioration of structural issues and minimizing repair costs. By promptly addressing these cracks, potential safety hazards can be mitigated, ensuring the longevity and structural integrity of the building. Recent advancements in science and technology have led to the development of automatic crack detection models, employing image processing and machine learning (ML) techniques [3][4][5][6].
Image processing-based techniques use statistical features from structural images to detect and locate cracks, treating them as regions with sudden pixel intensity changes. Machine Learning (ML)-based models utilize hand-crafted features, such as edge, texture, and color, for automatic crack detection [7,8]. With the availability of massive datasets, researchers have turned to Deep Learning (DL), particularly Convolutional Neural Networks (CNNs), for more effective crack detection. The success of DL-based models, especially neural networks with multiple layers, has significantly improved feature learning. CNNs, with varied filters highlighting crucial features, extract basic image features in the initial layers and advanced, crack-specific features in the deeper layers. These features are then passed to a multi-layer perceptron classifier for crack detection. The accessibility of powerful computing resources and continuous advancements in training techniques on readily available datasets propel the rapid development of deep learning. Despite this success in feature extraction, there is a need to enhance the accuracy of these models in detecting concrete cracks.
In this research, we put forth a method based on transfer-learned deep convolutional neural networks (DCNNs) that uses the pre-trained weights both as a classifier and as a feature extractor. The method offers considerable gains in performance and training time and mitigates the unavailability of large datasets. This paper also investigates the impact of ML classifiers learned over deep features for crack detection. Three publicly available datasets were used for the study: SDNET2018 [9], Concrete Crack Images for Classification (CCIC) [10], and Bridge Crack Dataset (BCD) [11]. The experiments conducted in this work are threefold: (1) crack detection based on transfer-learned deep CNNs, in which 12 state-of-the-art CNN models pre-trained on ImageNet were used to classify the crack images; (2) crack detection using transfer-learned CNNs on enhanced crack images; (3) examination of the DCNNs' performance as feature extractors.

Analysis of the effectiveness of image enhancement techniques such as contrast enhancement and Local Binary Pattern (LBP) pre-processing on transfer learned DCNN models for crack detection.

Development of a Support Vector Machine (SVM) classification model on deep features extracted from the aforementioned DCNN models.
The paper is organized as follows. The related crack detection research is covered in Section "Literature review". The proposed system and the experiments carried out to categorize the images are described in Section "Proposed Methodology". A description of the various datasets used is provided in Section "Dataset". The outcomes and conclusions of the experiments are described in Section "Experimental Result and Analysis". The paper is concluded in Section "Conclusion and Future Scope".

Literature review
A thorough description of the most recent crack detection models is provided in this section. Crack detection models found in the literature can be divided into three major categories based on their workflow: (1) models based on traditional image processing algorithms; (2) models based on machine learning models; (3) models based on deep learning models.

Classical image processing-based models
Crack detection using image processing methods has three major steps: image acquisition, pre-processing, and crack detection [12]. The target component is first photographed in high quality with a camera or any other imaging instrument. The pre-processing step then eliminates noise and shadows from the images by applying filters, segmentation, and other techniques. If necessary for the particular crack detection technique being used, the image may be transformed to gray-scale or binary format. The resulting image is then put through crack detection, which emphasizes or segments the image's cracked area using image processing techniques such as edge detection, segmentation, or pixel analysis [13]. Lins et al. [14] developed a method to identify cracks using several color models such as HSV (Hue-Saturation-Value) and RGB (Red-Green-Blue). They proposed a color feature extraction model, which searches for certain color compositions in an image in comparison to a standard query color. Further, the authors used their crack measurement algorithm to measure the length and width of the detected cracks. Shahrokhinasab et al. [15] analyzed various image processing methods, such as edge detection and thresholding, to classify cracks. Munawar et al. [16] analyzed different methods of fissure detection, including genetic programming, beamlet transformation, an Unmanned Aerial System based approach, and the Shi-Tomasi algorithm. Zou et al. [17] introduced an automated crack detection system called CrackTree, which uses a geodesic shadow removal algorithm to eliminate shadows from pavement images.
A crack probability map is produced using tensor voting, and a graph model is built by choosing crack seeds from the crack probability map. Recursive edge pruning in the graph's Minimum Spanning Tree (MST) is used to find the final crack curves. Gabor filters were employed by Salman et al. [18] for crack detection. Niu et al. [19],

Machine learning-based models
ML-based models for crack detection follow five steps: dataset collection, pre-processing of images, feature extraction, model training on the extracted features, and testing. Landstrom and Thurley [23] employed morphological operators to segment the cracks from the image and used logistic regression on the segmented images to distinguish crack from non-crack images. Prasanna et al. [24] put forth a crack detection method called spatially tuned robust multi-feature (STRUM), in which the authors explored classifiers including SVM, AdaBoost, and Random Forest. Lin et al. [25] used hidden Markov random field-expectation-maximization (HMRF-EM) for automatic pavement crack detection, with two major modules. First, the hidden Markov random field model and its expectation-maximization are combined with an adaptive line detector to increase detection accuracy. Second, the integrity and continuity of the detected cracks are improved by a quantitative description of the crack region's credibility and conditional connection. FG Pratico et al. [26] provided a method for classifying the structural health condition of several vibro-acoustically different road pavement cracks (concealed bottom-up cracks) using supervised machine learning techniques. The technique gathers the signatures (using roadside acoustic sensors) and categorizes the structural health status of the pavement using ML models. They compared various ML classifiers, including the random forest classifier (RFC), support vector classifier (SVC) and multi-layer perceptron (MLP). The results indicate that the SVC is the best-performing ML model, with an accuracy of 99.1%. Zhang et al. [27] suggested a new method for identifying surface fractures in coal mining sites using Unmanned Aerial Vehicle (UAV) imagery and ML.
The overall accuracy was increased to 88.99% by applying the V-SVM classifier. The authors also used Laplace sharpening to improve the color of the images and Principal Component Analysis (PCA) to reduce the feature set while retaining 95% of the initial variance. An ML and computer vision pipeline was proposed by Zhang et al. [28] for detecting the formation of fatigue cracks. Cracks were detected using an ML model, and vision-based algorithms were further utilized to examine the growth direction and length of the fatigue crack. The primary problem with ML-based models for crack detection is the selection and extraction of relevant features for the classifier's training.

Deep learning-based models
Numerous crack detection models have been developed in the literature as a result of recent developments in deep learning (DL), particularly the evolution of convolutional neural networks (CNNs) [29]. DL-based models for crack detection follow steps analogous to the ML-based models described above. The major difference is that DL models perform feature extraction implicitly. A dataset of surface cracks must be gathered first to train the DL model. To minimize noise, eliminate shadows, and modify other properties such as image size and brightness, the images are then pre-processed using image processing techniques. These images are then subjected to pixel-by-pixel annotation, or labeling, where the pixels corresponding to cracks are annotated either manually or by using annotation tools. One example of labeling is making the crack pixels white ("1") and the remaining pixels in the image black ("0"). Following this, a DL architecture such as a CNN must be chosen for crack detection. Li et al. [30] proposed a deep neural architecture with a convolutional block, four dense connections, five deep supervision modules, three conversion modules and one fusion module to identify cracked surfaces. Zhang et al. [31] introduced a CNN architecture with four convolutional layers and two fully connected layers. Their convolutional network achieved a precision of 0.869 and a recall of 0.925 for crack detection. Meng et al. [32] proposed a deep residual neural network-based concrete crack identification method that identified concrete crack images at the pixel level. Transfer learned EfficientNetB0 was employed by C. Su and W. Wang [33] for crack detection. They reported an accuracy better than that of a fully convolutional network proposed by Ye et al. [34], which gave an accuracy of 93.6%. Feng et al. [35] used transfer learning on the InceptionV3 model to classify damage, with crack, intact, spalling, seepage and rebar exposure as the classes. A custom convolutional neural network with three convolutional layers was introduced by Kim et al. [36] for crack detection. The images were pre-processed using morphological filters and contrast enhancement operators and were then used to train the CNN model for the identification of cracks. Cao et al. [37] used object detection paradigms such as Faster RCNN and SSD along with MobileNet, Inception, ResNet, and Inception ResNet to detect road cracks. They used mAP (mean average precision) as the performance metric to test the combinations of object detection paradigms and DCNNs. Among all the combinations, Faster RCNN paired with Inception V2 gave the best results, with an mAP of 53.06%. A two-stage detection model including a DCNN and a segmentation module was proposed by NHT Nguyen et al. [38]. The authors showed that the segmentation of cracks at the pixel level improves detection accuracy significantly. In a study presented by SE Park et al. [39], cracks on concrete structure surfaces were identified using DL and structured light technologies, which combine two laser sensors with vision.
The YOLO model was used to identify the cracks, and the sizes of all cracks were calculated using the positions of the laser beams on the structural surface. Huyan et al. [40] presented a model named CrackU-net, which detects pavement cracks with a precision of 0.986. Kim et al. [41] proposed a crack detection technique using a shallow CNN architecture. They optimized the LeNet-5 model's hyper-parameters to obtain a maximum accuracy of 99.8% with fewer parameters. Even though some of these models performed well in feature extraction and classification on various applications, their accuracy needs to be increased to detect concrete fractures. In this paper, we evaluate the effectiveness of transfer-learned deep features for crack detection using raw and enhanced crack images, which shows a significant boost in performance.

Proposed methodology
This section introduces DCNNs and their application to crack detection in detail. DCNNs, which were first developed in the 1980s, are the most well-known, advanced, and popular DL algorithms [42]. Earlier, researchers were not drawn to DCNNs because computational resources, powerful processors, and large storage devices were scarce. But as computers' capacity for computing, database retrieval, and storage expanded, the idea gained popularity [43]. Later, in [44], CNNs were successfully applied to classification problems and outperformed other approaches on most computer vision problems. Figure 1 depicts a typical CNN structure. The initial layers of a DCNN extract basic image features such as edges, patterns, and textures. The middle layers extract object-level information like shape and color, whereas the deeper layers extract class-level features like the whole object. The feature extraction layers' final output is passed into either a fully connected neural network [45] for classification or a bounding box and pixel classification layer for segmentation.
CNN has emerged as the most widely used and successful DL architecture for various input data types including images, videos and texts, with several cutting-edge architectures reported in the literature. VGG16, VGG19 [46], Xception [47]

Transfer learning for crack detection without image enhancement
Transfer learning is a machine learning technique in which a model created for one task is used as the basis for another task [55][56][57]. The use of pre-trained models as the foundation for computer vision and natural language processing tasks is a common strategy in DL research due to the massive computing time and resources required to develop neural network models [58]. The benefits of using a transfer learned model over an end-to-end neural network include significant time and computation savings. Recent research reveals that transfer-learned models outperform traditional neural networks and can work with smaller amounts of data. Generally, for computer vision applications, the features extracted by the first and middle layers of a neural network are similar for similar inputs. The later layers, which extract high-level features, make the difference. The proposed model freezes the first and middle layers and makes the final layers trainable. We retain the weights from the original model, trained on a comparatively large dataset, and train only a few parameters.
Figure 3 illustrates the process of transfer learning applied to a Deep Convolutional Neural Network (DCNN) using pre-trained ImageNet weights. In this experiment, we adapted the DCNN model for crack detection by leveraging the weights learned from the ImageNet dataset.
To accomplish this, we first removed the final layers of the pre-trained models. These layers were then replaced with a new architecture consisting of several components: a flatten layer to convert the 2D feature maps into a one-dimensional vector, a batch normalization layer, a dropout layer, and a dense layer with two neurons and a sigmoid activation function. The transfer learning model was then trained, but with a specific focus on optimizing only a subset of parameters. Specifically, most parameters from the pre-trained layers were frozen, meaning they were not updated during training. Only the parameters from the newly added custom layers were fine-tuned. This approach allows the model to retain the general features learned from the ImageNet dataset while adapting its final layers to the specific task of crack detection with a smaller amount of data and computational resources.
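The freezing strategy described above can be illustrated with a deliberately small, dependency-free sketch. It is not the authors' implementation: the "backbone" here is a hypothetical fixed linear map with a ReLU that stands in for the pre-trained layers, the head is a single logistic unit rather than the flatten, batch normalization, dropout and dense stack, and the data are made up. The point is only that gradient updates touch the head parameters while the backbone weights stay untouched.

```python
import math

# Hypothetical stand-in for pre-trained weights; these are never updated.
BACKBONE = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

def frozen_backbone(x, weights):
    """Fixed feature extractor: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, backbone_weights, epochs=200, lr=0.1):
    """Train only the newly added head; the backbone is treated as frozen."""
    head_w = [0.0] * len(backbone_weights)
    head_b = 0.0
    # Because the backbone is frozen, its features can be computed once.
    feats = [frozen_backbone(x, backbone_weights) for x in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            head_w = [w - lr * err * fi for w, fi in zip(head_w, f)]
            head_b -= lr * err
    return head_w, head_b

def predict(x, backbone_weights, head_w, head_b):
    f = frozen_backbone(x, backbone_weights)
    z = sum(w * fi for w, fi in zip(head_w, f)) + head_b
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy data (made up): label 1 = "crack", label 0 = "non-crack".
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
head_w, head_b = train_head(samples, labels, BACKBONE)
predictions = [predict(x, BACKBONE, head_w, head_b) for x in samples]
```

In a deep learning framework the same idea is usually expressed by marking the pre-trained layers as non-trainable before compiling the model, so that the optimizer only sees the parameters of the new layers.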

Transfer learning for crack detection with image enhancement
Two image enhancement methods, Local Binary Pattern and contrast enhancement, were employed to pre-process the input images used to train the DCNN models. Image enhancement modules were introduced with the assumption that, when trained on enhanced input images, ConvNets would converge easily, lowering computational costs and improving accuracy. The assumption was supported further by various benchmark evaluation metrics, as shown in the following sections. The image enhancement algorithms were selected based on the literature, as proposed by Wang et al. [59] and Chen et al. [60].

Contrast enhancement
Contrast enhancement in the image makes dark areas darker and light areas lighter, making cracks appear darker than other surfaces.This creates a significant difference between the dark and light areas, which will aid in subsequent classification 59 .
Algorithm 1 details the steps followed for contrast enhancement, and Fig. 4 shows the results of contrast enhancement on crack images selected randomly from the dataset. Figure 4 also plots the histograms corresponding to the original images and the contrast-enhanced images. From the figure, it is evident that the histograms of the original crack images are not uniform (skewed towards the right), whereas those of the enhanced images are uniform.
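As a rough illustration of the effect described above (dark areas become darker and light areas lighter), the following sketch applies a linear contrast stretch about mid-grey with an optional brightness offset, and computes a grey-level histogram. The exact formula of Algorithm 1 is not reproduced here, so the mapping `contrast * (p - 128) + 128 + brightness` and the parameter names are assumptions chosen for clarity, not the authors' procedure. Images are plain nested lists of 8-bit grey values.

```python
def adjust_contrast(image, contrast=1.5, brightness=0):
    """Linear contrast stretch about mid-grey (128); an assumed formula."""
    def clip(value):
        # Keep results inside the valid 8-bit range.
        return max(0, min(255, int(round(value))))
    return [
        [clip(contrast * (p - 128) + 128 + brightness) for p in row]
        for row in image
    ]

def histogram(image, bins=256):
    """Count how many pixels take each grey level."""
    counts = [0] * bins
    for row in image:
        for p in row:
            counts[p] += 1
    return counts

# Toy 2 x 2 image (made-up values).
original = [[100, 120], [140, 200]]
enhanced = adjust_contrast(original, contrast=1.5)
```

With a contrast factor above 1, values below 128 move towards 0 and values above 128 move towards 255, which is the separation between crack and background pixels that the section relies on.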

Local binary pattern (LBP)
LBP is a primitive texture operator that labels pixels in an image by thresholding each pixel's vicinity based on the current pixel [61]. It is considered an efficient descriptor due to its resistance to changes in illumination, computational simplicity, and reliability in image classification. The LBP algorithm divides the image into smaller cells and uses the intensity of the center pixel as a threshold for the remaining pixels in the cell. When neighboring pixels are greater than the threshold value, they are thresholded to 1; otherwise, they are thresholded to 0. The binary number is generated by circularly visiting the matrix. The resulting binary number is then converted to a decimal and used to update the value of the center pixel. In the mathematical representation of the LBP feature descriptor, R is the radius and P denotes the pixels adjacent to the center pixel; $c_p$ is the center pixel's grayscale value, and $n_p$ is the grayscale value of the neighboring pixel. The LBP algorithm is detailed in Algorithm 2. Figure 5 compares results obtained from the image enhancement module (contrast enhancement and LBP pre-processing) for random images from SDNET [8]. From Fig. 5, it is evident that the crack regions are more clearly visible in the contrast-enhanced images than in the original and LBP pre-processed images.
Although the LBP operator attempted to capture the underlying texture of the input image, it was unable to highlight the cracked regions. The same is demonstrated by experimental results in terms of model accuracy on contrast-enhanced images and LBP pre-processed images, as shown in Section "Transfer Learning for Crack Detection with Image Enhancement".
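The thresholding procedure described in this section can be sketched for the simplest case of 8 neighbours at radius 1. In the usual formulation the code of a pixel is $\sum_{p=0}^{P-1} s(n_p - c_p)\,2^p$, where $s(\cdot)$ is the thresholding step; this is stated here as the commonly used form and is an assumption about the paper's equation. The sketch follows the text in setting a bit only when the neighbour is strictly greater than the center (many library implementations use greater-or-equal instead), and the clockwise visiting order starting at the top-left neighbour is also an assumption.

```python
# Offsets of the 8 neighbours, visited clockwise from the top-left pixel.
# The starting point and direction are assumptions; any fixed order works.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """LBP code of the pixel at (r, c) for P = 8 neighbours, R = 1."""
    center = image[r][c]
    code = 0
    for p, (dr, dc) in enumerate(NEIGHBOURS):
        # Bit is 1 when the neighbour is greater than the center pixel.
        bit = 1 if image[r + dr][c + dc] > center else 0
        code |= bit << p
    return code

def lbp_image(image):
    """Apply the operator to every interior pixel; borders are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = lbp_code(image, r, c)
    return out

# Toy 3 x 3 patch (made-up values): bright corners around a mid-grey center.
patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
center_code = lbp_code(patch, 1, 1)
```

Because the code depends only on whether each neighbour is brighter than the center, a uniform change in illumination leaves it unchanged, which is the robustness property mentioned above.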

Crack detection using ML models based on deep features from DCNNs
The effectiveness of deep features extracted from DCNNs for classification is described in this section. The generic CNN architecture comprises a wide range of filters, pooling operators (max pooling, average pooling), and nonlinear activations (ReLU, sigmoid, softmax). The filters are learned in either a supervised or unsupervised manner and extract relevant information from the input image. The pooling layers reduce the spatial dimension of the intermediate feature maps from the convolution layers, and the activations introduce nonlinearity. The initial layers of DCNNs extract basic image features such as edges, textures, and color, whereas the deeper layers extract complex class-specific features. This work proposes to use the representations produced by the deep layers of the CNN, also known as deep features, as the feature representation for the input images. Pre-trained CNN models including VGG16, VGG19, ResNet50, MobileNet, etc. were employed to extract the deep feature vectors that model the high-level representation of the inputs. The extracted deep feature vectors are then fed into an ML algorithm such as SVM for further classification, as depicted in Fig. 6. The choice of deep feature representations for classification with ML models is based on the assumption that ML models can produce accurate results when trained on a good feature representation, and that deep features extracted from the final layers of DCNNs provide such high-level representations, implying a symbiotic relationship.
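The second stage of this pipeline, a classifier trained on fixed feature vectors, can be sketched without any external library. The section does not state the SVM kernel or hyper-parameters, so the example below assumes a linear SVM trained by sub-gradient descent on the hinge loss with an L2 penalty; the three-dimensional "deep features" are made-up numbers standing in for the much longer vectors a DCNN would produce, and all function names are illustrative.

```python
import random

def train_linear_svm(features, labels, lam=0.01, epochs=200, lr=0.05, seed=0):
    """Linear SVM via sub-gradient descent on hinge loss + L2 penalty.

    Labels must be +1 (crack) or -1 (non-crack).
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The L2 penalty shrinks the weights at every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge term contributes only when the margin is violated.
            if margin < 1:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def svm_predict(x, w, b):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Made-up feature vectors standing in for DCNN deep features.
deep_features = [
    [2.0, 0.1, 1.5], [1.8, 0.3, 1.2], [2.2, 0.2, 1.7],   # crack
    [0.2, 1.9, 0.1], [0.1, 2.1, 0.3], [0.3, 1.7, 0.2],   # non-crack
]
targets = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(deep_features, targets)
train_predictions = [svm_predict(x, w, b) for x in deep_features]
```

In practice the feature vectors would come from a forward pass of each image through the pre-trained network, and an established SVM implementation with a tuned kernel and regularization strength would normally replace this hand-written trainer.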

Dataset
This section details the datasets used for the experiments. Three publicly available datasets were used for the study: SDNET2018 [9], Concrete Crack Images for Classification (CCIC) [10] and Bridge Crack Dataset (BCD) [11]. We formatted each dataset to have an equal number of data points in every class; results may differ under class imbalance [69].
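One simple way to obtain an equal number of data points per class is random undersampling of the larger class. The section does not say how the balancing was done, so the sketch below is an assumption for illustration only; the file names, the fixed seed and the function name are hypothetical.

```python
import random

def balance_classes(items, labels, seed=0):
    """Randomly undersample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label in sorted(by_class):
        # Draw `target` distinct items from each class without replacement.
        for item in rng.sample(by_class[label], target):
            balanced.append((item, label))
    rng.shuffle(balanced)
    return balanced

# Hypothetical file names: 3 crack images and 7 non-crack images.
names = [f"img_{i}.jpg" for i in range(10)]
classes = ["crack"] * 3 + ["non-crack"] * 7
balanced = balance_classes(names, classes)
```

Fixing the seed keeps the selected subset reproducible across runs, which matters when several models are compared on the same split.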

SDNET dataset
The SDNET dataset includes 56,092 images of cracked and non-cracked bridges, pavement, and wall surfaces. Images of bridge decks were obtained from the Systems, Materials, and Structural Health (SMASH) Laboratory at Utah State University, which houses a variety of full-scale bridge deck sections. Images of walls and pavements were taken on the premises of the Utah State University campus. All of the images are 256 × 256 pixels in size [...] 14 non-cracked images. The details of the count of crack and non-crack images of the 3 datasets are provided in Table 2. Since all these datasets are quite large, we conducted the experiments with a smaller number of images from each of them. Table 3 summarizes the train and validation split of the images used for experiments for the three datasets, and Fig. 7 shows sample images from the three datasets.

Hardware and software specifications
The models were implemented on Google Colaboratory and Jupyter notebook with the Machine Learning and Deep Learning packages.The hardware specifications used for the experiments are listed in Table 4.

Performance measures
Accuracy, sensitivity, specificity, precision, recall, F1-score, and training duration were used to assess each model's performance. The confusion matrix, displayed in Table 5, summarizes the model's overall performance and is used to calculate the performance metrics shown below.
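The count-based measures named above follow directly from the four cells of a binary confusion matrix: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The sketch below uses the standard definitions; the function name and the example counts are illustrative and do not come from the paper's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators for degenerate confusion matrices.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Made-up counts for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

Reporting precision and recall alongside accuracy is useful here because accuracy alone can hide a model that favours one class, which the F1-score then summarizes in a single number.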

Accuracy
The number of predictions made correctly by the model relative to the total number of predictions made.

Precision
A measure of how good the model is at predicting a particular category: the proportion of samples predicted as Positive that are actually Positive.

Recall or sensitivity
The proportion of all Positive samples that were correctly identified as Positive.

F1-Score
The harmonic mean of precision and recall, given by:

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Table 5. Confusion matrix (columns: predicted class, Positive (1) and Negative (0)).

The study and findings of transfer learning without image enhancement are covered in this subsection. Using the ImageNet weights from the pre-trained model reduced the number of parameters that needed to be trained in this experiment. To achieve this, the top layers of all the pre-trained models were replaced with a flatten layer, a batch normalization layer, a dropout layer, and a dense layer with two neurons and a sigmoid activation function. The transfer learning method's fine-tuned hyper-parameters are tabulated in Table 6. Table 7 summarizes the precision, recall and F1 score obtained for the transfer-learned DCNNs on the three publicly available datasets under study. From Table 7, it is evident that all the transfer-learned DCNNs perform poorly on the SDNET dataset compared to CCIC and BCD in terms of the three benchmark evaluation metrics under consideration. Based on this observation, the SDNET dataset was selected for the second experiment on transfer learned DCNNs using enhanced crack images.

Transfer learning for crack detection with image enhancement
Images from the SDNET dataset were pre-processed using image enhancement algorithms, and the enhanced images were used to transfer learn the DCNNs. Contrast enhancement and texture feature analysis using the LBP operator were employed to enhance the crack images. The transfer-learned models were then trained using the enhanced images. EfficientNetB0 achieved the highest test accuracy of 65.10% on contrast-enhanced images (an improvement of 16.8%), whereas MobileNetV2 achieved the lowest test accuracy of 41.20%. The model that fared best among those trained using LBP pre-processed images was Xception, with a test accuracy of 60.80% (an improvement of 15.6%), whereas ResNet152 underperformed with a test accuracy of 42.40%. Figure 11 and Fig. 12 compare the performance of transfer learned DCNNs on contrast-enhanced images and LBP pre-processed images respectively. Table 8 compares the results without and with image enhancement on SDNET images. Highlighted improvements include those in recall, precision, and F1 score. It is evident from the table that contrast enhancement improved the performance of most of the deep CNN architectures under consideration for crack detection, since the enhanced images highlight the cracked regions better than the original images.

Experiment 3: Crack detection using ML models based on deep features from DCNNs
Deep features extracted from the final fully connected layers of DCNNs and a Support Vector Machine (SVM) are employed in this subsection to categorize the images into crack and non-crack classes. SVM is the most appropriate model for datasets with fewer samples of high-dimensional features, and the deep features extracted from the fully connected layers of DCNNs are high-dimensional in nature [62]. Deep features and SVM increased the overall classification accuracy of the models, as tabulated in Table 9. From Fig. 13, it can be seen that MobileNet produced an accuracy of 83.16% (best model) on the SDNET dataset with deep features and SVM, while VGG16 has an accuracy of 77.16%. All 12 deep CNN models achieved an accuracy greater than 99% on the CCIC dataset, as shown in Fig. 14. The models VGG16, ResNet152, MobileNet, MobileNetV2, and EfficientNetB0 are the most accurate in this category, with an accuracy of 99.83%. Among these top 5 DCNNs in terms of accuracy, MobileNetV2 has the fewest training parameters (2,223,872). From these observations, it can be inferred that MobileNetV2 demonstrated the best trade-off between accuracy and trainable parameters on the CCIC dataset. From Fig. 15, it is observed that the best models on the BCD dataset are ResNet101 and EfficientNetB0, both of which have an accuracy of 99.83%. EfficientNetB0 is preferred over ResNet101 as it has nearly ten times fewer trainable parameters.
Table 9 compares the improvement of the ML models based on deep features for all three datasets. MobileNet performed best among the models on the SDNET dataset, where accuracy increased by between 20 and 30%. For the CCIC dataset the accuracy enhancement is 10%, and for the BCD dataset it is 11%.

Overall inference
The proposed study employed 12 pre-trained CNN models to get the best performance for identifying crack and non-crack surfaces. InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, and EfficientNetB0 deep models were used to assess the performance of deep feature extraction and transfer learning. The results on the three datasets (SDNET, CCIC, and BCD) show that the models did exceptionally well on the CCIC and BCD datasets. This is because each of these datasets is consistent, with just one type of surface. The SDNET dataset, on the other hand, has many crack and non-crack images of various surfaces. This makes it challenging for the models to achieve the necessary accuracy on the SDNET dataset. Transfer learning models performed well on the SDNET dataset, with ResNet101 outperforming the others. ResNet50 and EfficientNetB0 were the best-performing models on the CCIC and BCD datasets, respectively. Even though some models did well on the CCIC and BCD datasets, their accuracy could still be improved. The findings of the following experiment, in which deep features were extracted and SVM was used to classify the data, were better than those of the prior one. MobileNet performed best among the models on the SDNET dataset, where accuracy increased by between 20 and 30%. On the CCIC and BCD datasets, each model's accuracy was close to 99%. The models' accuracy significantly improves when the extracted deep features are fed to the SVM classifier.
The accuracy of the models on the SDNET dataset could still be increased. Performance measures were therefore also assessed after all of these models were trained on images that had first been pre-processed. While the texture operator LBP did not significantly affect model accuracy, increasing contrast proved to be a helpful pre-processing strategy that led to greater accuracy. This experiment outperformed the prior transfer learning models, but did not reach the accuracy attained by deep features fed into the SVM classifier. From Table 10 and Table 11 it is inferred that, out of the three experiments, classifying images as crack or non-crack using deep features provided to the SVM classifier was the most successful and produced superior accuracies across all datasets (SDNET: MobileNet; CCIC: MobileNetV2; BCD: EfficientNetB0).

Conclusion and future scope
The proposed study compared the effectiveness of Deep Convolutional Neural Networks as a classifier and as a feature extractor for crack detection.
• The performance of 12 different transfer-learned DCNN models for crack detection was evaluated and analyzed on three publicly available datasets: SDNET, CCIC and BCD. The effectiveness of image enhancement and deep features extracted from the final fully connected layers of CNN models for classification was also analyzed in terms of benchmark evaluation metrics.
• ResNet101 (accuracy: 53.40%), EfficientNetB0 (accuracy: 98.8%) and ResNet50 (accuracy: 99.8%) produced the best accuracy with normal images from the SDNET, BCD and CCIC datasets respectively. Since the effectiveness of transfer learned deep models was minimal on the SDNET images, two image enhancement methods (contrast enhancement and Local Binary Pattern) were employed on the images.
• The experimental results show that the enhanced images improved the accuracy of transfer-learned crack detection models significantly.
• The effectiveness of deep features extracted from the final fully connected layers of DCNNs was analyzed in terms of classification accuracy. The extracted deep features were fed into SVM for classification, and the analysis in terms of accuracy, precision, recall, and F1-score revealed that the integration of deep features with SVM improved the detection accuracy across all the DCNN-dataset combinations.
• Among the SDNET models, MobileNet was the finest model, with an improvement in accuracy of between 20 and 30%. Accuracy on the CCIC and BCD datasets was close to 99% for MobileNetV2 and EfficientNetB0 respectively.
The main takeaway is that these models can enhance the efficiency, accuracy and decision-making processes in civil engineering applications. With ML/DL models, structural health monitoring becomes easier and more efficient: potential structural issues are identified at an early stage, contributing to faster maintenance and better safety. A custom ensemble model combining the best DCNNs for crack detection could be considered as future work. There has been substantial research on problems such as security 63 and resource allocation 64 using ML and DL models. With enough models that detect cracks accurately, many use cases can be developed to bring this technology to consumers.
14:14517 | https://doi.org/10.1038/s41598-024-63767-5

ResNet50, ResNet101, ResNet152 48, InceptionV3 49, InceptionResNetV2 50, MobileNet 51, MobileNetV2 52, DenseNet121 53 and EfficientNetB0 54 are some of the well-known and leading-edge DCNN architectures for classification. DCNN varieties for classification, segmentation, or localization can be used to detect cracks in the input image. This paper proposes transfer learning-based DL models for crack identification through classification. This work carried out three experiments: (1) transfer learning for crack detection without image enhancement; (2) transfer learning for crack detection with image enhancement; (3) crack detection using SVM on deep features. Figure 2 depicts the experiments carried out in the proposed model.
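The third experiment, an SVM trained on deep features, can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: the two-dimensional vectors stand in for features that would really come from a DCNN's last fully connected layer, the SVM is linear, and it is trained with plain stochastic sub-gradient descent on the hinge loss. The kernel, solver and hyperparameters used in the study are not given in this section.

```python
import random


def train_linear_svm(features, labels, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Fit a linear SVM by stochastic sub-gradient descent on the
    L2-regularised hinge loss. Labels are +1 (crack) or -1 (non-crack)."""
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # The regulariser always shrinks the weights slightly.
            weights = [w - lr * lam * w for w in weights]
            # Samples inside the margin also push the boundary away from them.
            if margin < 1:
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (crack) or -1 (non-crack) for one feature vector."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical toy "deep features"; real ones have far more dimensions.
crack = [[2.0, 1.5], [1.5, 2.5], [2.5, 2.0]]
non_crack = [[-2.0, -1.5], [-1.5, -2.5], [-2.5, -2.0]]
features = crack + non_crack
labels = [1] * len(crack) + [-1] * len(non_crack)
weights, bias = train_linear_svm(features, labels)
predictions = [predict(weights, bias, x) for x in features]
```

In this arrangement the DCNN acts only as a fixed feature extractor, so the classifier trained on top of it is small and quick to fit compared with fine-tuning the whole network.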

Figure 2. Proposed Transfer Learning Architecture for Crack Detection with Pre-trained CNN Models on ImageNet Weights.

Figure 3. Transfer Learning Pipeline used in the proposed model.

Algorithm 1: Contrast Enhancement Algorithm
1. Take an input image, a brightness value and a contrast value.
2. Check whether brightness is equal to 0; if yes, go to step 3, else go to step 5.
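Since only the first steps of Algorithm 1 are available here, the sketch below does not reconstruct the remaining steps. It shows a generic linear brightness and contrast adjustment on an 8-bit grayscale image, which is one common way to realise such an enhancement. The function name `adjust_brightness_contrast`, the gain/offset formulation and the clipping range are assumptions.

```python
def adjust_brightness_contrast(image, brightness=0, contrast=1.0):
    """Apply out = clip(contrast * pixel + brightness, 0, 255) to every pixel.

    image: 2D list of 8-bit grayscale values
    brightness: additive offset (assumed parameterisation)
    contrast: multiplicative gain (assumed parameterisation)
    """
    adjusted = []
    for row in image:
        new_row = []
        for pixel in row:
            value = int(round(contrast * pixel + brightness))
            # Clip to the valid 8-bit range.
            new_row.append(max(0, min(255, value)))
        adjusted.append(new_row)
    return adjusted


# A gain above 1 spreads mid-range intensities further apart.
enhanced = adjust_brightness_contrast([[0, 100, 200]], brightness=-20, contrast=1.5)
```

With a gain above 1, intensity differences between a dark crack and the surrounding surface are amplified, at the cost of clipping very dark or very bright pixels.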

Figure 4. Contrast enhancement on the crack images. (a) Original image (b) Histograms of original images (c) Contrast-enhanced image (d) Histograms of contrast-enhanced image.

Figure 6. Pipeline for crack detection using ML models based on deep features from DCNNs.

Figure 7. Sample crack and non-crack images from the three datasets. (a) Crack images from SDNET (b) non-crack images from SDNET (c) Crack images from CCIC (d) non-crack images from CCIC (e) Crack images from BCD (f) non-crack images from BCD.

Figure 8. Performance comparison of transfer learned DCNNs on SDNET without image pre-processing.

Figure 10. Performance of transfer learned DCNNs on BCD without image pre-processing.

Figure 11. Performance of transfer learned DCNNs on SDNET with contrast enhancement.

Figure 12. Performance of transfer learned DCNNs on SDNET with LBP pre-processing.

Figure 13. Performance of SDNET with deep features.

Figure 14. Performance of CCIC with deep features.

Figure 15. Performance of BCD with deep features.

Table 1. SDNET dataset.

The CCIC dataset includes images of concrete cracks and non-cracks. It contains more than 40,000 pictures gathered from different METU campus buildings. This dataset is balanced and covers only one type of surface, concrete. It has 20,000 images in each class, crack and non-crack. The images are of size 227 × 227. The Bridge Crack Dataset (BCD) includes over 6070 images of cracked and uncracked bridge surfaces. The crack images were captured using the Phantom 4 Pro's 1024 × 1024 CMOS surface array camera. The images were later reduced to 224 × 224 to create the dataset. This dataset contains 4056 cracked images and

Table 2. Crack and non-crack image distribution in SDNET2018, CCIC and BCD datasets.

Table 7. Summary of performance comparison of transfer learning for crack detection in terms of precision, recall and F1 score on three datasets.

Table 8. Comparison of transfer learned models with/without image enhancement on SDNET dataset in terms of precision, recall and F1 score.

Table 9. Comparison of ML models based on deep features in terms of accuracy.

Table 10. Best model for all three datasets over different experiments.

Table 11. Maximum accuracy for all three datasets over different experiments.